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Abstract. An accelerated polynomial expansion scheme to construct the den- 
sity matrix in quantum mechanical molecular dynamics simulations is pro- 
posed. The scheme is based on recursive density matrix expansions, e.g. [Phys. 
Rev. B. 66 (2002), p. 155115], which are accelerated by a scale-and-fold tech- 
nique [J. Chem. Theory Comput. 7 (2011), p. 1233]. The acceleration scheme 
requires interior eigenvalue estimates, which may be expensive and cumber- 
some to come by. Here we show how such eigenvalue estimates can be extracted 
from the recursive expansion by a simple and robust procedure at a negligi- 
ble computational cost. Our method is illustrated with density functional 
tight-binding Born— Oppenheimer molecular dynamics simulations, where the 
computational effort is dominated by the density matrix construction. In our 
analysis we identify two different phases of the recursive polynomial expansion, 
the conditioning and purification phases, and we show that the acceleration 
represents an improvement of the conditioning phase, which typically gives a 
significant reduction of the computational cost. 



1. Introduction 

With the fast growth of computational processing power, atomistic simulations 
based on quantum mechanical calculations of the electronic structure have become 
a powerful approach to the study of a broad range of problems in materials science, 
chemistry, and biology [26l [33l [64] . Nevertheless, the computational cost associated 
with electronic structure calculations normally limits applications to fairly small 
systems. In particular, the cubic, 0{N^), scaling of the computational cost as 
a function of the number of atoms, TV, for the regular solution of the quantum 
mechanical eigenvalue problem is considered to be a most limiting factor. A number 
of different electronic structure technologies have therefore been developed that 
circumvent this bottleneck with a computational effort that scales only linearly with 
the system size [6l [18] . The reduction in the cost is typically achieved by utilizing 
sparse matrix algebra in an iterative construction of the density matrix, which 
avoids the full regular solution of the quantum mechanical eigenvalue problem. 
The matrix sparsity arise in localized atomic basis set representations due to the 
short-range character of the electronic wavefunctions for non-metallic materials 
O [22l [28] . With linear scaling techniques it is today possible to perform electronic 
structure calculations involving millions of atoms [SJ [5S] . 

An ideal linear scaling method should combine several important properties: i) 
it should be well tailored for high-performance on heterogeneous multi-core archi- 
tectures, ii) require little intermediate memory storage, in) allow well controlled 
numerical accuracy, and iv) have a low computational prefactor. Among several 
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linear scaling techniques [2ll4l[l0l[ni[22l[24l[25l|3ll[Mlll5],the recursive second- 
order spectral projection method (SP2) [37], which sometimes also is referred to 
as the trace-correcting purification algorithm, represents a surprisingly simple and 
efficient choice [57^ . The SP2 algorithm allows for control of errors arising from an 
approximate sparse matrix algebra [411 1541 155j and it has a low computational pref- 
actor. Even in the dense (non-sparse) limit the SP2 algorithm outperforms regular 
lapack diagonalization techniques on multi-core platforms, both with respect to 
speed and accuracy [9] . The computational cost of the SP2 algorithm is dominated 
by a sequence of matrix-matrix multiplications; typically between 20 to 40 multipli- 
cations are needed. Highly optimized linear algebra subroutines can perform dense 
matrix-matrix multiplications with close to peak performance on multi-core plat- 
forms. Thus, even without utilizing linear scaling sparse matrix algebra, the SP2 
algorithm has excellent performance and is well suited for modern computational 
architectures [5]. 

Recently Rubensson proposed an acceleration technique that can further boost 
the performance of the SP2 algorithm, as well as other recursive polynomial ex- 
pansion schemes [52]. The acceleration is based on the idea to allow for nonmono- 
tonicity in the interval of interest, which gives additional flexibility in the choice 
of recursive expansion polynomials. By the use of a scale and fold technique, i.e. 
by shifting and re-scaling the SP2 projection polynomials, the number of matrix- 
matrix multiplications required by the SP2 algorithm is significantly reduced. This 
boosting technique requires estimates of interior eigenvalues that may be expensive 
and cumbersome to come by. In this article we show how such eigenvalue estimates 
can be extracted by a simple procedure from the recursive polynomial expansion 
at negligible computational cost. 

Our scheme is particularly useful in molecular dynamics simulations. A molec- 
ular dynamics simulation often involves hundreds of thousands of time steps and 
therefore requires a low computational cost for the calculation of each new config- 
uration. This is provided by the accelerated SP2 algorithm where eigenvalue esti- 
mates that are necessary for the acceleration can be extracted from previous time 
steps at practically no extra cost. The scheme is illustrated with density functional 
tight-binding molecular dynamics simulations, where the computational effort is 
dominated by the construction of the electronic density matrix through the recur- 
sive polynomial expansion. All the simulations are performed using dense matrix al- 
gebra in the computational framework of extended Lagrangian Born-Oppenheimer 
molecular dynamics [381 161] , which provides accurate and stable molecular trajec- 
tories with long-term energy conservation. 

The article is outlined as follows; first we describe the problem and the recursive 
SP2 expansion of the density matrix that is used iteratively in the self-consistent 
calculation of the electronic ground state. Thereafter the acceleration technique is 
presented and we show how the required interior eigenvalues can be estimated from 
the recursion. The article goes on with a presentation of the extended Lagrangian 
formulation of Born-Oppenheimer molecular dynamics and we illustrate the be- 
havior of our scheme in some quantum mechanical molecular dynamics simulations 
before we end with a discussion and some concluding remarks. 
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2. The density matrix 

The single-particle density matrix at zero electronic temperature can be defined 
as a matrix function of the effective Hamiltonian matrix, 

(1) D = e{tii-H), 

where 9{x) is the Heaviside step function and is the chemical potential or Fermi 
level. We restrict our applications to non-metallic materials and assume that H 
does not have degenerate eigenvalues at /j,. We further assume that all matrices 
are Hermitian. We will refer to the smallest eigenvalue above /i as the lowest 
unoccupied molecular orbital (lumo) eigenvalue and to the largest eigenvalue below 
/i as the highest occupied molecular orbital (homo) eigenvalue. The gap between 
the homo and lumo eigenvalues is referred to as the homo-lumo gap. See Figure [l] 
for an illustration. 

The density matrix is the matrix for orthogonal projection onto the occupied 
subspace spanned by the eigenvectors of H that correspond to eigenvalues smaller 
than /X. The dimension of the occupied subspace is equal to the number of occupied 
electron orbitals, iVocc- Usually, the number of electrons is given as input to the 
program which has to automatically adjust the chemical potential so that a correct 
occupation number is achieved. 

In general, the method of choice for computation of matrix functions of sym- 
metric matrices is diagonalization [201 P- 84]. In short, an eigendecomposition 
H = VhV'^ is computed (with A diagonal) and the matrix function can then be 
computed as f{H) = V f{h)V'^ . This is the standard method to solve ([T]) in the 
context of electronic structure calculations. 

Since 13 is a projection matrix, an eigendecomposition, which possesses infor- 
mation about all eigenpairs of H, is not really needed to form D. Any basis of 
the occupied subspace could be used to construct D. Important for the accuracy 
however is to resolve the step at the chemical potential. In other words, while 
rotations of the eigenvectors within the occupied (or unoccupied) subspace do not 
affect the result at all, rotations corresponding to a leakage between the occupied 
and unoccupied subspaces do affect the accuracy, and are more likely to occur be- 
tween eigenvectors corresponding to eigenvalues around the chemical potential. One 
would therefore expect the problem to become more difficult when the homo-lumo 
gap decreases. This is reflected by the condition number for the problem. 

An appropriate condition number for the problem of computing the density 
matrix can be defined as 

\\^{^JiI-{H + hA))-e{llI-H)\\ 

(2) K = lim sup , 

'l^Oyl:||A||=Ae 

which can be evaluated to k — Ae/^, where Ae = Amax — A„iin is the spectral width 
and ^ is the homo-lumo gap of H [53l [55] . In the assessment of density matrix 
methods, it is therefore important to consider how the computational cost scales 
with the homo-lumo gap. 

2.1. The self-consistent field procedure. In electronic structure methods such 
as Hartree-Fock [521 IHII or density functional theory [3T1 , the effective single- 
particle Hamiltonian matrix depends on the electronic configuration, which is de- 
termined by the density matrix. If the density matrix used to construct the Hamil- 
tonian matrix is equal to the density matrix given by (ll]), we say that the solution 
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Figure 1 . Eigenvalues of the effective Hamiltonian matrix H cor- 
responding to occupied (red circles) and unoccupied (blue crosses) 
orbitals. (A|^^j„^„, \^^^^) and (a{^)„„, a{^^^J are intervals contain- 
ing the homo and lumo eigenvalues, respectively. The chemical 
potential /i is a number between Ahomo and Aiumo- Amin and Amax 
are either the extremal eigenvalues or lower and upper bounds 
thereof. We will refer to X^^^^^ and a|^^q as inner hounds, and 
^homo and a[^^„ as outer hounds. 



is self-consistent. Finding the correct self-consistent electronic ground state there- 
fore requires an iterative procedure, with repeated computation of Hamiltonian and 
density matrices, 

(3) Do^ Hi^ Di^ .... 

In its simplest form, this self-consistent field procedure is a fixed-point iteration, 
where in each iteration the most recent density or Hamiltonian matrix is used as 
input to the next step. Usually, however, the Hamiltonian matrix is in each step 
taken as a linear combination of the most recent and previous Hamiltonian matrices 
to reduce the number of iterations needed to reach convergence. Popular mixing 
schemes include simple linear mixing, the direct inversion in the iterative subspace 
(DHS) method H^, and Broyden mixing [71125]. 

In Born-Oppenheimer molecular dynamics simulations [33j . the self-consistent 
field procedure is employed in each time step, as the electronic ground state solu- 
tion is needed in the computation of the interatomic forces. In the context of the 
present work it docs not matter if the sequence of matrices in ([3| is generated by 
a self-consistent field optimization or if it is coming from molecular dynamics time 
stepping, as long as two successive Hamiltonians are reasonably similar to each 
other. The important point is that there is some procedure generating a sequence 
of Hamiltonian matrices Hi , H2 , . . . for which the corresponding density matrices 
need to be computed via (fTl). 



3. Recursive polynomial expansion of the density matrix 

There are several alternatives to the eigenvalue decomposition for the calculation 
of the matrix step function in ([T]) . In the context of linear scaling electronic struc- 
ture theory some of the first techniques were based on serial Chebyshev expansions 
[m Uni EH EO] , where the sparsity of the Hamiltonian can be used efficiently to 
achieve linear scaling complexity in the calculation of the Chebyshev polynomials 
through their recurrence relation. The degree of the Chebyshev polynomial required 
to reach a certain accuracy is proportional to k — Ae/^ [TH]. The computational 
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cost therefore scales as 0{k) or at best 0(^/K) if the niatrix polynomial is explicitly 
evaluated with the Paterson-Stockmeyer method [351 |3H] . 

A difficulty with the Chebyshev techniques is that the chemical potential in ([!]) 
needs to be known in advance or has to be adjusted iteratively in order to achieve 
the correct occupation number. Falser and Manolopoulos proposed a scheme |46j 
that overcomes this problem through a recursive expansion based on the third-order 
purification polynomial ~ 2X^ that originally was designed by McWeeny to 
adjust the fractional occupation of approximate density matrices to represent pure 
ensembles (351 ISS]- In the Palser-Manolopoulos (PM) scheme (as well as in the 
original purification method by McWeeny) the density matrix is given through a 
rapid recursive expansion of the Heaviside step function where 



By adjusting Xq = fo{H) such that Xq has the correct trace, i.e. the desired 
occupation, A^occj the successive McWeeny polynomials, /„(X„-i), can be modified 
to preserve the trace. In this trace conserving (or canonical) "purification" scheme 
the step is automatically formed at the chemical potential and no prior knowledge of 
the chemical potential is required. By using sparse matrix algebra each polynomial 
recursion Xn — /n(X„_i) can be calculated with linear scaling effort. 

A problem with the Palser-Manolopoulos scheme is that it is slow at low and 
high occupation, i. e. when Nocc/N is close to or I. A solution to this problem 
was offered by the recursive SP2 expansion scheme by Niklasson |37J. Instead of 
using third (or higher) order polynomials that simultaneously project the occupied 
eigenvalue spectrum of H towards 1 and the unoccupied towards 0, the SP2 scheme 
uses a combination of two second order spectral projection polynomials, 



that either projects the eigenvalues towards 1 for /„(X) = 2X — X^ or towards 
for fn{X) = X^, see Figure [2j It is easy to see that any recursive combination 
of these two polynomials converges to a step function with the step formed in the 
interval [0,1]. Moreover, since the trace of X^ is always smaller than the trace 
of X, given that all eigenvalues of X are in [0,1], and vice versa for 2X — X^, 
we can choose between the two polynomials to iteratively correct the occupation 
such that the trace of the converged expansion automatically has the desired value. 
The occupation correcting SP2 algorithm is described in some detail in Alg. [T] 
After an initial normalization, where AJf^a,x/min ^re upper and lower bounds of the 
eigenspectrum of H, all eigenvalues of X are in reverse order in the interval [0, 1]. 
Thereafter we apply the spectral projection polynomials that are chosen to achieve 
the correct occupation at convergence. Prior knowledge of the chemical potential 
or the homo-lumo gap is not required. 

Linear scaling computational complexity in density matrix methods is usually 
achieved by neglecting matrix elements deemed not to contribute significantly to 
the overall accuracy [54j . For recursive polynomial expansions on the form Q the 
removal of matrix elements can be done in such a way that given a tolerance e, an 
accuracy [ID — -D|| < e is guaranteed [SUES]. Here, D and D are the exact and 
approximate density matrices respectively, and || • || is a unitary invariant norm. 

The polynomial expansion order increases very rapidly in the SP2 recursion. 
After only 30 matrix-matrix multiplications the expansion order is over 1 billion. 



(4) 



e{fiI-H)= lim /„(/„-!(... /o(i?)...))- 



(5) 
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and the number of required multiplications increases only with the logarithm of the 
condition number, i. e. as ©(logK) [37 . Furthermore, the SP2 method performs 
well wherever the chemical potential may be located. In fact, the SP2 method is 
at its best at low and high occupation numbers. Also, the SP2 method is very 
memory-efficient as it requires a total of only two symmetric matrices in memory. 

Algorithm 1 Trace-correcting 2nd-ordcr spectral projection expansion (SP2) 



for i = 1, 2, . . . , n do 

if |Tr(X2) - N,,,\ < \Tt{2X - X^) - N,,,\ then 

X^X^ 
else 

X ^2X -X^ 
end if 
end for 
return X 



4. Acceleration 

The recursive expansion functions in Q are usually taken as polynomials that 
have fixed points at and 1, vanishing derivatives at and/or 1, and that are 
monotonically increasing in the [0, 1] interval. Such polynomials are very good for 
bringing a near-idempotent matrix closer to idempotency. For the SP2 polynomials, 
for example, the asymptotic convergence rate is quadratic. However, the initial 
matrix is typically far from being idempotent, and most of the work in the recursive 
expansion is therefore spent on bringing the matrix near idempotency. For this 
task polynomials on the form described above are not optimal. By allowing for 
non-monotonicity in the projection polynomials the convergence can be boosted 

The basic idea behind the acceleration technique of the recursive polynomial 
expansion scheme is that prior knowledge of the eigenvalue distribution allows for a 
more optimized design of the projection polynomials that boosts convergence |52j. 
What is needed is, more precisely, estimates of the homo and lumo eigenvalues 
Ahomo and Aiumo in Figure [T] If such estimates are available, the eigenspectrum 
can in each step be shifted and scaled so that the projection polynomials (X^ 
or 2A" — A"^) fold the eigenspectrum, which results in a more rapid convergence. 
Effectively, the SP2 projection polynomials are shifted and rescaled as illustrated 
in Figure [2j The accelerated trace-correcting expansion is given by Alg. [2j 

Too aggressive scaling can lead to the occupied and unoccupied eigenstates being 
mixed up. Therefore, a lower bound of the homo eigenvalue and an upper bound of 
the lumo eigenvalue, aIj^j^^^ and X^^l^^ in Figure [l] are used in the accelerated algo- 
rithm. This makes the algorithm robust with respect to the homo-lumo estimates. 
If only loose bounds of the homo and lumo eigenvalues are available, the acceler- 
ation can still be used. As the algorithm is formulated, loose bounds will never 
lead to reduced efficiency or accuracy compared to the original SP2 algorithm, but 
tighter bounds result in a more efficient acceleration. 

The process of bringing the matrix to idempotency can be seen as consisting 
of two different phases, that we shall refer to as the conditioning phase and the 
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Figure 2. The regular SP2 projection polynomials X'^ and 2X — 
(solid lines). The shifted and rescaled polynomials used in the 
accelerated recursive expansion, {{l — a)I+aXY and 2aX — {aXY 
(dashed lines). Here, the scaling parameter was taken as a = 
1.3. In practice, this parameter is adjusted during the course of 
the recursive expansion to give the best possible acceleration, see 
Alg.i 



purification phase. In the first phase the idempotency error is not reduced, 

but the condition number of the problem is lowered. Note that in each recursive 
expansion step, the problem that remains to be solved can be seen as an independent 
matrix step function problem with an associated condition number. The recursive 
expansion procedure may then be seen as a procedure to reduce the condition 
number to 1. When the condition number is close to 1, and the conditioning can 
no longer be substantially improved, we enter the purification phase in which the 
idempotency error is reduced. This behavior is illustrated in Figure [3] 

It is clear that the acceleration represents an improvement of the conditioning 
phase which can be explained by a steeper slope of the projection polynomials 
at the chemical potential |37j, where the occupied states are separated from the 
unoccupied states. As the condition number comes close to 1, and we enter the 
purification phase, the acceleration parameter ai in Alg.|2] comes close to 1 and the 
method becomes essentially equivalent to the original method without acceleration. 
Thus, the acceleration is automatically turned off when the purification phase is 
reached and the quadratic convergence of the SP2 method sets in. 
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Algorithm 2 Accelerated expansion (SP2+ACC) 



Amax — '^min '^max — '^min 

3: for i = 1, 2, . . . , n do 

4: if |Tr(A2) - Aoecl < |Tr(2A - A^) - Aocd then 

5: Pi = l 

6: a, = 2/(2 - X2) 

7: X = {1 - ai)I + ttiX 





(shift and scale) 



A = A2 



Xfe (1 - Qfi) + aiXk, 



10: else 

11; p, = 

12: a, = 2/(l + .Ti) 

13: X = A 

14: A = 2A - A2 



(scale) 



Xk — 2xfc x^, /c = 1, 2 
16: end if 
17: = ||A- A^ll;. 

18: = Tr(A - A2) 

19: end for 
20: return A 



The drawback with the acceleration technique is that estimates of interior eigen- 
values are needed. Standard iterative methods, such as e.g. the Lanczos method, 
provide rapid convergence to well separated extremal eigenvalues. When interior 
eigenvalues are to be computed, spectral transformations are typically employed to 
move the desired eigenvalues to the ends of the eigenspectrum. Transformations 
include shift-and-invert operators and various polynomial filters, see e.g. [Tl [Tl]. 
The computational overhead that would be incurred to the recursive expansion by 
incorporating iterative eigenvalue methods depends on several factors such as ma- 
trix sparsity and hardware and software details. In many cases the computational 
overhead would be significant. However, in the context of recursive polynomial 
expansions one may take advantage of the formation of a step at the chemical 
potential. The recursive expansion leads to extremely good separation between 
eigenvalues around the chemical potential which can be utilized to drastically re- 
duce the number of Lanczos iterations ^6|. Nevertheless, even if it is possible to 
implement such schemes with good efficiency, the use of any iterative method for 
eigenvalue calculation would add to the complexity of the method. 

Eigenvalue estimates can also be cheaply computed directly from the entries of 
the matrix, for example on the basis of Gershgorin's circle theorem. Unfortunately, 
Gershgorin's theorem rarely provides useful information about interior eigenvalues, 
at least not in the context of the present work. 

In this section, we propose a new simple and efficient approach to calculate 
estimates of the homo and lumo eigenvalues when the recursive expansion is used in 



5. Eigenvalue estimates 
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Figure 3. Illustration of the two phases of the recursive expan- 
sion. In the conditioning phase, the condition number k — Ae/^ 
is lowered but the idempotency error \\X ~ X'^\\2 is not reduced. 
When the condition number is close to 1 we have reached the purifi- 
cation phase, where the idempotency error decreases quadratically. 
The test calculations were performed for a test Hamiltonian for 
which the condition number Ae/^ = 1000 and with the chemical 
potential located at ^ = A,nin + 0.3(Aniax — Amin)- The tested meth- 
ods are the trace-conserving or canonical scheme (PM) and 
the trace-correcting scheme with (SP2-ACC) and without (SP2) 
acceleration. 



an outer loop such as for example molecular dynamics time stepping. The proposed 
method consists of two parts. The first is calculation of accurate estimates directly 
from matrix entries in the recursive expansion of the density matrix. The second 
is propagation of eigenvalue estimates between time steps (or SCF iterations). 

5.1. Accurate interior eigenvalue estimates directly from matrix entries. 

The proposed method is based on the following observations. We first recall that 

(6) ^^<\\Ah<\\Ay. 



Now, let {Aj} be the eigenvalues of Xi. Then, 



(7) ||X,-Xf||^= /^(A,-A2)2 
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and 



(8) 



\Xi — XA\2 — 



2 = max I Xj 
j 



The spectral norm in ([s]) is given by the eigenvalue closest to 0.5. In the last 
iterations of the density matrix recursion, this eigenvalue is either the homo or the 
lumo eigenvalue. Either the homo or the lumo eigenvalue can then be calculated via 
calculation of the norm in ([s]). In the last recursive expansion iterations, most of the 
eigenvalues are very close to zero and one, and the sum over all eigenvalues in ^ is 
dominated by the few eigenvalues furthest from convergence. This is illustrated in 
Figure |4] As we get closer to convergence, the Frobenius norm therefore becomes 
a better and better estimate of the spectral norm. The Frobenius norm of Xi — Xf 
in the final iterations can then be used to obtain estimates to the homo and lumo 
eigenvalues. Once the Frobenius norm has been computed, upper and lower bounds 
of \\Xi — Xf\\2, given by (|6|, can be translated to bounds for the homo or lumo 
eigenvalues by taking the inverse of the recursive expansion |56j . This gives us a 
very sharp upper bound. However, for the lower bound in ([6]), we can do better. 

Let r]j = Xj — Xj, j = 1, 2, . . . , TV be the eigenvalues of Xi — Xf. Since rjj > 
0, J = l,2,. 



. ,N, we have that 



(9) 

and since 
(10) 

(11) 

and 
(12) 

we have that 
(13) 



^77j2 < max77j^?7fe 

J ' k 



3 



= \\x,, - X 



\x,, - xi 



Tr(X, - Xt 



2 — max 7]j 

j 



Tr(X, 



-Xf) 



< \\x,, - Xf 



<\\X,~Xfh)m 
Xf arc equal, then the left 



Both the lower bound in ( 13 ) and the lower bound ( 
^ are sharp in the sense that if all eigenvalues of Xi 
hand side equals the right hand side. However, the bound in ( 13 ) is sharp also in 
the sense that if Xi — Xf has only one nonzero eigenvalue, then the left hand side 
equals the right hand side as well. In the present context, many eigenvalues are 



expected to be zero, and we therefore expect the bound in ( 13 ) to be much better. 



Before we can formulate an algorithm we need to address a few issues. The first 
question is which iterations that should be considered to be the last iterations, so 
that we can be sure that we compute the homo or the lumo eigenvalue and not 
some other eigenvalue. 

From Alg.|2] we have v, = \\Xi-Xf\\F- By ^ we have that v, > \\X,-Xf\\2 = 
A — where A is the eigenvalue of Xi that is closest to 0.5. Assume that Vi < 1/4, 
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then we have that the interval 



(14) 



1/1 1/1 



Vi, \- \ Vj 

4 ' 2 V 4 



is free from eigenvalues. 

Assume now that in iteration i + 1, Ahomo > 0.5 and that Aiumo < 0.5, and that 
the interval [7, 1 — 7] is free from eigenvalues, where 7 is a parameter between 
and 0.5. If the interval [7, 1 — 7] is free from eigenvalues also in iteration i and it 
is not possible for an eigenvalue to jump from a value larger than 1 — 7 to a value 
smaller than 7 and vice versa in one iteration, then we know that Ahomo > 0.5 and 
that Aiumo < 0.5 also in iteration i. 

An eigenvalue can only jump from a value larger than 1 — 7 to a value smaller 
than 7 and vice versa in one iteration provided that 7 > 6 - 4a/2 « 0.343146. This 
value is given by solving 

(15) a2A2 + 2a(l-a)A + (l-a)2 =7 

where A = 1 — 7 and a = for 7 e [0, 0.5]. 



Thus, by the above reasoning and specifically by (14), the w^-value can be used 
for eigenvalue estimation provided that 

(16) < 7-7^j = i,i + l,...,n 

with 7 = 6 — 4-\/2. We only use the Wi-values for i that fulfills this criterion. 

It would now be natural to ask the question how to know if it is the homo or 
lumo eigenvalue that dominates the Frobenius norm value in ([t]). Our response to 
this question is that we do not need to know. Regardless of whether the homo or 
lumo eigenvalue dominates the norm value, |Aj — A|| < Vi holds for all eigenvalues. 
Since we know from above that Aiumo < 0.5 and that Ahomo > 0.5, we therefore 
have that Aiumo < |(1 — and that Ahomo > ^(1 + \/T^^4vi). Thus, the 

inner bounds for the homo and lumo eigenvalues always hold provided that the 



7-criterion ( 16 1 is fulfilled. 

Clearly, the same is not true for the outer bounds, see Figure |4] However, in 
each iteration, at least one of the outer bounds will be valid, either for the homo 
or the lumo eigenvalue. This means that provided that each of the homo and 
lumo eigenvalues dominate the norm value in at least one iteration for which the 
7-criterion holds, the most conservative outer bound holds. 

When the polynomial in each iteration is chosen based on the trace of the ma- 
trix, as in Algs. [T] and [2] it can in principle happen that one of the homo or lumo 
eigenvalues dominates the norm value in all the final iterations. This could happen 
if there are many eigenvalues clustered very close to either the homo or the lumo 
eigenvalue. This problem can be avoided by choosing polynomial based on the homo 
and lumo eigenvalue estimates, as in [55], which would also improve convergence. 

Truncation and/or rounding errors cause eigenvalues to be disturbed. This has 
the effect that the eigenvalues at convergence will still deviate from their desired 
values of and 1. Since the homo or the lumo eigenvalue dominates the norm value, 
this only matters in the very last iterations when the homo and lumo eigenvalues 
are very close to 1 and 0, respectively. As above this only leads to too conservative 
inner bounds and that the outer bounds may not hold in every iteration, which can 
be understood from Figure [4] In our algorithm we use the least conservative inner 
bounds and the most conservative outer bounds. 
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5.2. Algorithm for homo and lumo eigenvalues. The algorithm to calculate 
the estimates is described in Alg. [3j For each iteration i that the 7-criterion holds, 
candidates for upper and lower bounds of the homo and lumo eigenvalues of Xi are 
computed. For those values, the recursive expansion is carried out backwards so 
that potential bounds for the homo and lumo eigenvalues of Xi_2, • ■ • , and 
eventually Xq are obtained. Then, an attempt is made to improve the inner bounds 
to make them tighter and the outer bounds to make them more reliable, see the 
discussion above. Finally, when all potential bounds have been tested, the selected 
bounds are translated to bounds for the homo and lumo eigenvalues of H . 

The algorithm requires some information that should be collected during the 
course of the recursive expansion: 

- The parameters Amin and Amax used for the initial normalization. 

- A boolean vector [pi,p2, ■ ■ ■ ,Pn] specifying which polynomial that was used 
in each iteration. 

- A vector [ai, a2, ■ • ■ , o:„] containing the acceleration or scaling parameters 
used in each iteration. 

- A vector [vi,V2j ■ ■ ■ , Wn] containing the Frobenius norms of X — X^ for each 
iteration. 

- A vector [wi,W2, ■ ■ ■ ,Wn] containing the traces of X — A^ for each iteration. 

All this information can be extracted from the recursive expansion iterations at 
practically no extra cost, see Alg. [2j 

5.3. Propagation of eigenvalue estimates. In the previous section, we devel- 
oped a method to extract bounds for the homo and lumo eigenvalues from the 
recursive polynomial expansion. This means that the bounds are available only 
when the expansion has already completed. In order to accelerate the recursive 
expansion the bounds are needed in advance. However, as discussed in Section [2. 1[ 
the recursive expansions are often used repeatedly for a sequence of effective Hamil- 
tonian matrices Hq, Hi, . . . . It is then possible to translate the eigenvalue bounds 
from one iteration to the next [55] . 

Let Ai < A2 < • • • < Aat be the eigenvalues of and Ai < A2 < • ■ • < Aat the 
eigenvalues of Hi . Then, by Weyl's theorem on eigenvalue movement [62] , 

(17) |A, -A,| < ||if,-i/,_i||2, J = 1,2,..., A. 

Therefore, by expanding the intervals containing the homo and lumo eigenvalues 
of Hi_i with the norm \\Hi — -ffi_i||2, we obtain intervals containing the homo 
and lumo eigenvalues of Hi. If the spectral norm cannot be readily obtained, the 
Frobenius norm can be used, see ^. If the norm is small (as is typically the 
case between SCF cycles), the acceleration of the following recursive expansion is 
hardly affected by this translation. If the norm is large (as is typically the case 
between time steps in a molecular dynamics simulation), the acceleration will not 
be optimal, but we have ensured the robustness of the expansion with respect to 
the homo and lumo estimates. 

6. Born-Oppenheimer molecular dynamics 

Among computationally feasible methods for molecular dynamics simulations, 
Born-Oppenheimer molecular dynamics based on self-consistent field methods, such 
as density functional theory [T^l [211 dHl HZ] , is often considered a gold standard. 
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Figure 4. Mapping of eigenvalues by the accelerated expansion. 
Upper panel: Mapping /i2(/ii(- . . /i(A) . . . )) of occupied (red cir- 
cles) and unoccupied (blue crosses) eigenvalues A of Xq. Lower 
panel: Absolute values of the eigenvalues of X12 — as a 
function of the eigenvalues of X^). The solid red lines and the 
dashed blue lines represent upper and lower bounds, respectively, 
of \\X12 — -^i2l|2, mapped back to the eigenspectrum of to 
give information about the homo and lumo eigenvalues. The in- 
ner bounds for the homo and lumo eigenvalues, respectively, are 
both valid. In the twelfth iteration the outer bound for the lumo 
eigenvalue also holds. The outer bound for the homo eigenvalue, 
however, is not valid. 



In Born-Oppenheimer molecular dynamics the forces acting on the atoms are ex- 
plicitly quantum mechanical and calculated at the relaxed electronic ground state 
[33]. The electronic ground state is given through the self-consistent field opti- 
mization procedure, discussed in Section 2.1 which involves iterative solutions of 
the density matrix and Hamiltonian in ([l|. The self-consistent field optimization 
accounts for details in the charge distribution and is important for the accuracy of 
the interatomic forces. 
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Algorithm 3 Eigenvalue computation 

1: i = n 

2: Xi — I, X2 — 1, 2:3 = 0, a;4 = 

3: while < 7 — 7^ do 

5: y2 = i (1 - \ /r^4^ ) 

6: ys = 5 (1 + \/l - 4^4 

8: for j = i, « — 1, . . . , 2, 1 do 

9: if pj then 

^g. 2/fc = Vy*=' 

= (yfc - 1 + aj)/Q!j, fc = 1,2,3,4 

11: else 

^2. 2/fe = 1 - VI - 

yk = yk/oLj, fc = l,2, 3,4 

13: end if 

14: end for 

15: Xfc = min(xfc,?;fc), fc = 1,2 

16: Xfe = max(a;fc,?;fc), /c = 3,4 

17: i = i - 1 

18: end while 

19: '^homo ~ -^max (•^max '^min)2;4 

20: ^j^Qj^-^Q — Ajiiax (-^max '^min)'^3 

21: ^Iqj^^o — '^max (-^max ^min)*^2 

22: ^i^j^^Q — '^max (-^max '^min)*^! 



In regular Born-Oppenheimer molecular dynamics the computational cost, which 
is dominated by the iterative self-consistent field optimization, is significantly re- 
duced by using a good initial guess to the optimization procedure, which is nat- 
urally provided through an extrapolation of the ground state solutions from pre- 
vious time steps. Since the iterative self-consistent field optimization in practice 
never is complete and always approximate, this extrapolation technique generates a 
time-dependent fictitious dynamics of the underlying electronic degrees of freedom. 
Because of the non-linearity of the self-consistent field optimization this process is 
irreversible, which typically is manifested in an unphysical systematic drift in the 
energy and phase space [l^l [SD] , where the electronic degrees of freedom act like 
a heat sink or source, constantly removing or adding energy to the system. Only 
by increasing the number of self-consistent field iterations can the energy drift be 
reduced, though it never fully disappears. Recently an extended Lagrangian for- 
mulation of Born-Oppenheimer molecular dynamics was introduced that overcomes 
these fundamental problems by restoring time-reversibility and symplecticity to the 
dynamics |M1 HOI ESI ISS EZ] ■ 

Instead of using an extrapolation of the electronic ground state from previous 
time steps, as in regular Born-Oppenheimer molecular dynamics, an auxiliary elec- 
tron density n(r) is introduced in the extended Lagrangian formulation as an ex- 
tended dynamical variable that is centered around the ground state density p{y) 
through a harmonic well. The equations of motion for the nuclear coordinates 
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R = {i?/} and the extended density are given by 
(19) n(r) = a;2 (p(r) - n(r)) , 

where Mj are the nuclear masses and a; is a frequency parameter that determines 
the curvature of the harmonic weU. J7(R; p) is the potential energy term that 
describes the electronic and ionic energy at the self-consistent electronic ground 
state. The dots denote time-derivatives. 



The extended Lagrangian molecular dynamics, (18) and (19 1, has several impor- 
tant features: i) n{r) moves in a harmonic well centered around the ground state 
density p(r) and is therefore a good initial guess to the iterative self-consistent field 
optimization procedure, ii) since n(r) is a dynamical variable, in contrast to p(r), it 
can be integrated with a geometric reversible scheme in the same way as the nuclear 
coordinates |30l 1381 143] , in ) in this way reversibility in the underlying propagation 
of the electronic degrees of freedom is not broken, and iv) since the equations of mo- 
tion for the nuclear coordinate is identical to "exact" Born-Oppenheimer molecular 
dynamics, the total Born-Oppenheimer energy, 

(20) Etot = lY.^^i^^i + ^(^'^P)^ 

I 

is still a constant of motion that should exhibit long-term stability [3S| . 

Since our estimates of the homo-lumo eigenvalues, which are used for the scale- 
and-fold acceleration in the construction of the density matrix, are given from pre- 
vious time steps, it is important to show that this additional artificial propagation 
does not have any effect on the reversibility and long-term energy conservation. 
Here we will demonstrate how the accelerated SP2 density matrix expansion using 
estimates of the homo-lumo eigenvalues from the previous time step is fully com- 
patible with extended Lagrangian Born-Oppenheimer molecular dynamics, without 
causing any systematic drift in the energy. Our accelerated expansion algorithm 
therefore provides a significant reduction of the computational cost, while keeping 
the accuracy, for an already highly efficient scheme. 

7. Numerical experiments 



In our numerical experiments the interaction potential ?7(R; p) in (181 is based 
on the self-consistent density functional tight-binding (SCC-DFTB) formulation of 
density functional theory [131 (TSl [HI HZl as implemented in the program LATTE 
[SI [21 [321 mi [5H] ■ The equations of motion are integrated with the Verlet scheme, in 
its symplectic velocity formulation for the nuclear degrees of freedom and using a 
regular Verlet scheme for the electronic degrees of freedom with a dissipative force 
term that removes any accumulation of numerical noise [40l [61] . In all cases the 
SP2 algorithm, with (Alg. [2]) or without (Alg. [T]) acceleration, was used with the 
recursions terminated when the occupation change, as measured by the difference 
between using X'^ or 2X — X'^, was less than 10"^". Two self-consistent ffeld 
iterations were used in each time step, which together with the force calculation 
sums up to totally three density matrix constructions per time step. The homo and 
lumo eigenvalue bounds were translated from one time step or self-consistent field 
iteration to the next as described in Section |5.3[ using the Frobenius matrix norm 
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Figure 5. The conservation of the total energy, (20), for regular 
Born-Oppenheimer molecular dynamics (BOMD) and extended 
Lagrangian Born-Oppenheimer molecular dynamics (XL-BOMD), 
with and without the acceleration scheme (ACC) for the SP2 algo- 
rithm. The two lower curves (red and blue) are virtually identical. 



in (17). All calculations were performed using regular dense (non-sparse) matrix 
algebra. 

Figure [5] shows the stability of the total Born-Oppenheimer energy, i?tot in 
( 20 ) , either with regular Born-Oppenheimer molecular dynamics using the elec- 
tronic ground state solution from the previous time step as an initial guess to the 
self-consistent field optimization or with extended Lagrangian Born-Oppenheimer 
molecular dynamics, (18) and (19), with or without the acceleration scheme. The 
systematic drift occurring in regular Born-Oppenheimer molecular dynamics van- 
ish in the extended Lagrangian formulation. Moreover, we find virtually no effect 
in the total energy from using the accelerated SP2 scheme with estimates of the 
homo and lumo eigenvalues obtained from the previous iteration. The two graphs 
are practically identical even after thousands of time steps. 

The gain in efficiency using the acceleration technique is significant. Figure [6] 
shows the average number of matrix-matrix multiplications required in the density 
matrix expansion in molecular dynamics simulations of various systems. For the 
simulation of the polyethene molecule we chose a 100 carbon atom chain (C100H202) 
and the methane example consisted of liquid methane (CH4)ioo. All the simulations 
were performed with a time step of 0.5 fs using a structurally optimized ground state 
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Figure 6. The average number of matrix-matrix multiplications 
of the SP2 algorithm during molecular dynamics simulations with 
and without acceleration. 



configuration with an initial Gaussian velocity distribution corresponding to 550 K. 
The efficiency gain is governed by the size of the homo-lumo gap [52j . For small gap 
systems, like naphthalene, the scale-and-fold acceleration gives a significantly higher 
reduction of the cost compared to large gap systems, like methane. In other words, 
the acceleration works best for ill-conditioned systems, as defined in Section [2] 
This also means that there is a much smaller variation in the number of matrix 
multiplications required for the accelerated scheme. The work load is therefore not 
only smaller but also less sensitive to the particular choice of materials system. This 
also means that the computational cost is more evenly distributed over time in a 
molecular dynamics simulation when there are large variations in the homo-lumo 
gap. 

Figure [7] (panel a-d) illustrates the effect of the acceleration as a function of 
simulation time for a system of liquid isocyanic acid. As seen in the two upper 
panels (a and b) the energetics between the regular and the accelerated scheme, 
as measured by the total energy or the temperature, are indistinguishable. Figure 
[t] (panel d) also shows how the number of matrix multiplications for the density 
matrix construction (that is required in the last self-consistent field iteration in each 
time step) varies over time. Variations in the work load may be sensitive to the 
choice of convergence criterion used to terminate the SP2 recursions, which here is 
set fairly tight. The accuracy of the homo-lumo gap estimate, shown in panel (c) is 
surprisingly good, in particular as measured by the inner (Inner) x'^^^^ and Aj^^^^ 
eigenvalues. The upper bound of the gap, given by the outer (Outer) eigenvalue 
bounds, is good but not as tight. The reason is that for the outer bounds we choose 
the most conservative values to make sure that we get valid bounds. It is in principle 



18 



E. H. RUBENSSON AND A. M. N. NIKLASSON 




Figure 7. The fluctuations during a simulation run of liquid iso- 
cyanic acid {HCNO)zq of the total energy (Etot in panel a), the 
temperature (Temp in panel b), the gap estimates (panel c), and 
the variation of the number of matrix-matrix multiplications (panel 
d) of the SP2 algorithm with and without acceleration (ACC). 

possible to devise a scheme that keeps track of which eigenvalue (the homo or lumo 
eigenvalue) that correspond to the spectral norm oi X — X'^. This information 
can be used to obtain more accurate outer honio-lumo bounds. However, such an 
algorithm would be more complicated with small gains in efficiency. 

8. Discussion 

In this article we have devised a surprisingly simple technique to extract informa- 
tion about the homo-lumo gap that enables a cost-efficient practical application of 
the scale-and-fold acceleration technique in molecular dynamics simulations. The 
technique is quite general and should be applicable to a broad class of recursive den- 
sity matrix expansion algorithms besides the SP2 scheme described here 
Scale-and-fold acceleration is not the only way to improve or design efficient re- 
cursive expansion polynomials. It is in principle possible to construct a number 
of "optimal" recursive expansion schemes where the projection polynomials, {/n}, 
in the recursive expansion Q are defined, for example, by minimizing the error in 
some given norm for a certain number of recursion steps, N, and with a maximum 
number. A/max, of necessary matrix multiplications, M, e.g. 

(21) {/J^o = argmin{|j0(^/ - H) - /^(/jv-i(. ■ • MH) . . .))|l;i\f < AW} ■ 
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With a detailed knowledge about the particular eigenvalue spectrum of H this prob- 
lem can be solved using a number of various non-linear optimization techniques. 
The key issue with such an optimized recursion scheme, (21 1, is that we in general 
do not have prior knowledge of the interior eigenvalue distribution. Moreover, in 
a molecular dynamics simulation or a self-consistent field optimization the opti- 
mization would have to be performed on-the-fly in each iteration. Alternatively, 
optimizations for different classes of distributions could be made, e.g. for uniform 
eigenvalue distributions with various homo-lumo gaps and spectral bounds, which 
possibly could be parametrized. Our accelerated SP2 recursion scheme may very 
well belong to such a hypothetical class of optimized expansion methods. 

In this article we have taken a cautious standpoint with respect to the homo and 
lumo estimates used in the acceleration. Although the inner bounds of the homo 
and lumo eigenvalues have proven to be extremely tight, we have used the more 
conservative outer bounds for the acceleration. Moreover, those outer bounds have 
been moved further away from the gap as a result of the propagation between time 
steps or self-consistent field iterations by (17). All those measures are not strictly 
necessary in order to avoid mixing up occupied and unoccupied states and lead 
to a less efficient acceleration. However, we also have to avoid folding extremal 
eigenvalues into the gap since this could lead to invalid homo and lumo eigenvalue 
bounds in the following iterations. With the conservative measures taken in this 
work this will never happen, though in practice it is often possible to use less 
restrictive homo-lumo estimates leading to a more efficient acceleration. 



9. Conclusions 

We have proposed a scheme to accelerate the construction of the density matrix 
in quantum mechanical molecular dynamics simulations. Our technique is based on 
the scale-and-fold acceleration method with estimates of the humo-lumo gap that 
are automatically acquired from the recursive expansion in the previous time step 
or self-consistent field iteration at a negligible computational cost. Our method was 
illustrated with density functional tight-binding molecular dynamics simulations, 
where the computational effort is dominated by the density matrix construction. 
Our scheme was found to be fully compatible with extended Lagrangian Born- 
Oppenheimer molecular dynamics and long-term energy stability. As a stand-alone 
technique, our method for calculating the homo-lumo gap estimates may also be 
useful in studies of materials properties. 

The experiments presented in this work were all carried out using dense matrix 
algebra. The use of sparse matrix algebra to achieve a linear scaling algorithm 
involves additional issues regarding ways to bring about matrix sparsity. Recursive 
expansions with rigorous error control requires homo and lumo eigenvalue estimates 
|55) . Fortunately, this is the same information that is needed for the acceleration. 
Thus, the eigenvalue estimates of the present work will also be useful in schemes 
to select small matrix elements for removal to achieve error control in the linear 
scaling construction of the density matrix. 

We identified two different phases in recursive density matrix expansions, the 
conditioning phase and the purification phase. With this point of view it was 
clear that the acceleration technique represents an improvement of the condition- 
ing phase. In this way, the acceleration is analogous to preconditioners used with 
iterative solvers for linear systems. By using prior knowledge of the system, the 
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condition number of the problem is more rapidly reduced in the (pre-)conditioning 
phase, which gives a significant reduction of the computational cost, as demon- 
strated in quantum mechanical molecular dynamics simulations. 
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